Functional near-infrared spectroscopy-based diagnosis support system for distinguishing between mild and severe depression using machine learning approaches

Abstract. Significance: Early diagnosis of depression is crucial for effective treatment. Our study uses functional near-infrared spectroscopy (fNIRS) and machine learning to classify mild and severe depression, providing an objective auxiliary diagnostic tool for mental health workers. Aim: To develop prediction models that distinguish between severe and mild depression using fNIRS data. Approach: We collected fNIRS data from 140 subjects during a verbal fluency task and applied a combined denoising method based on complete ensemble empirical mode decomposition with adaptive noise and wavelet thresholding (CEEMDAN-WPT) to remove noise. Temporal features (TF) and correlation features (CF) from 18 prefrontal channels were extracted as predictors. Using recursive feature elimination with cross-validation, we identified the optimal TF and CF and examined their role in distinguishing between severe and mild depression. Machine learning algorithms were then used for classification. Results: Combining TF and CF as model inputs yielded higher classification accuracy than using either TF or CF alone. Among the prediction models, the SVM-based model performed well in nested cross-validation, achieving an accuracy of 92.8%. Conclusions: The proposed model can effectively distinguish mild depression from severe depression.


Objective
The objective of this experiment was to assess the performance of the model presented in the main text in differentiating mild depression from healthy controls using functional near-infrared spectroscopy (fNIRS) data.

Participants
Because few healthy volunteers were available, our control group comprised only 16 individuals. All participants were fully informed about the study and provided appropriate consent. None of the participants had any other neurological condition (such as stroke, brain tumor, severe concussion, or migraine) or cardiovascular disease (such as myocardial infarction or arrhythmia), and all had normal auditory and visual function. Their health status was assessed by psychiatric professionals using the Hamilton Depression Rating Scale (HAMD). To mitigate the impact of the small sample size, we randomly selected subsets of fNIRS data from both the mild depression patients and the healthy controls for model training. As a result, our dataset consisted of 56 participants (40 with mild depression and 16 healthy controls). Further participant details are shown in Table 1.

Data Collection and Pre-processing
The fNIRS data were collected while participants performed the verbal fluency task (VFT). Data collection and pre-processing followed the procedures described in the main study.

Feature Extraction
In the main study, recursive feature elimination with cross-validation identified a set of 35 optimal features, selected for their relevance to distinguishing mild from severe depression. The same feature set was used in the comparison between individuals with mild depression and healthy controls. Because these features were chosen for their ability to distinguish levels of depression severity, applying them to this comparison allowed us to assess differences in brain activity patterns between individuals with mild depression and those without depressive symptoms.
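A minimal sketch of how a fixed feature subset might be carried over to the new cohort. It assumes, as an illustration rather than the study's confirmed definitions, that correlation features are pairwise Pearson correlations between the 18 channel time series and that temporal features are stored as an 18 × 9 array (one row per channel). The 35 selected indices are not listed in this text, so the index array in any usage is hypothetical, as are the function names.

```python
import numpy as np


def correlation_features(signals: np.ndarray) -> np.ndarray:
    """Pearson correlation for every channel pair (upper triangle, diagonal excluded).

    signals has shape (n_channels, n_samples); 18 channels give 18*17/2 = 153 values.
    Assumption: CF are defined as pairwise Pearson correlations.
    """
    corr = np.corrcoef(signals)  # each row is treated as one variable (channel)
    upper = np.triu_indices(corr.shape[0], k=1)
    return corr[upper]


def fused_vector(temporal: np.ndarray, signals: np.ndarray) -> np.ndarray:
    """Concatenate per-channel temporal features (n_channels x n_tf) with the CF vector."""
    return np.concatenate([temporal.ravel(), correlation_features(signals)])


def apply_fixed_selection(X: np.ndarray, selected: np.ndarray) -> np.ndarray:
    """Keep only the previously selected columns; no re-selection on the new data."""
    return X[:, selected]
```

Reusing the index set, rather than re-running feature selection on the 56-participant dataset, keeps the comparison tied to the features chosen for severity classification.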

Model Evaluation
To evaluate the model's performance in classifying depressive status, we pooled the fNIRS data from both groups (mild depression and healthy controls) and randomly divided the dataset into training and testing sets at a 6:4 ratio. The model was trained on the training set to learn the patterns associated with each depressive status and was then evaluated on the held-out testing set. Performance was assessed with the same metrics as in the main text: AUC, specificity, sensitivity, accuracy, and F1 score. Together, these metrics measure how accurately the model identifies the depressive status of the participants.
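The evaluation protocol above can be sketched as follows. This is an illustrative sketch, not the study's code: the function names, the label coding (1 = mild depression, 0 = healthy control), the 0.5 decision threshold, the seed, and the unstratified random split are all assumptions. AUC is computed in its pairwise form, the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half), which equals the area under the ROC curve.

```python
import numpy as np


def split_60_40(n_samples: int, seed: int = 0):
    """Random, unstratified 6:4 split of sample indices (seed is arbitrary)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(0.6 * n_samples))
    return order[:n_train], order[n_train:]


def binary_metrics(y_true, y_score, threshold: float = 0.5) -> dict:
    """Accuracy, sensitivity, specificity, F1 and AUC; label 1 = mild depression (assumed)."""
    y_true = np.asarray(y_true, dtype=int)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)

    # Confusion-matrix counts
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)

    # Pairwise AUC: compare every positive score with every negative score
    diff = y_score[y_true == 1][:, None] - y_score[y_true == 0][None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))

    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "auc": auc,
    }
```

With 56 participants, a 6:4 split leaves roughly 22 test samples, so each misclassified test participant shifts accuracy by about 4.5 percentage points.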

Results
We evaluated temporal features, correlation features, and their fusion as inputs to four classification models and validated the results on the test set, as shown in Table 2. With the fused features as input, both the SVM and MLP algorithms performed well. This suggests that the model can distinguish individuals with mild depression from healthy controls, complementing its primary use in assessing depression severity.

Our model is primarily designed to differentiate levels of depression severity, and it showed high accuracy in distinguishing severe from mild depression. In this supplementary experiment, despite the small healthy control group, preliminary results indicate that the model can also differentiate mild depression from healthy individuals using fNIRS data. However, further research with a larger sample, including more healthy participants, is needed to validate these initial findings and confirm the reliability and consistency of the model. Future studies should also consider other clinical factors that may influence depression severity, to better interpret the predictive outcomes.

The temporal features
The temporal features used in our study are Maximum, Minimum, Mean, Rectification Average, Skewness, Peak Factor, Mean Squared Frequency, Power Spectral Entropy, and Singular Spectral Entropy. Together they summarize the amplitude, distribution shape, spectral content, and complexity of each channel's signal.
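For reference, common textbook definitions of these features are given below for a channel signal $x_1,\dots,x_N$, where $P(f_k)$ is the power spectrum at frequency $f_k$ and $s_i$ are the singular values of the signal's trajectory (embedding) matrix. These are assumed standard forms, not the study's own formulas, which may differ in normalization or logarithm base.

```latex
\begin{aligned}
x_{\max} &= \max_{i} x_i, \qquad x_{\min} = \min_{i} x_i, \qquad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i,\\
\bar{x}_{\mathrm{rect}} &= \frac{1}{N}\sum_{i=1}^{N} \lvert x_i \rvert,\\
\mathrm{Skew} &= \frac{\frac{1}{N}\sum_{i=1}^{N} (x_i-\mu)^3}{\sigma^3},
  \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i-\mu)^2},\\
C &= \frac{\max_i \lvert x_i \rvert}{\sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}},\\
\mathrm{MSF} &= \frac{\sum_k f_k^2\, P(f_k)}{\sum_k P(f_k)},\\
H_{\mathrm{PSE}} &= -\sum_k p_k \ln p_k, \qquad p_k = \frac{P(f_k)}{\sum_j P(f_j)},\\
H_{\mathrm{SSE}} &= -\sum_i q_i \ln q_i, \qquad q_i = \frac{s_i}{\sum_j s_j}
\end{aligned}
```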
Table 3: Explanations and formulas of the temporal features

| Feature | Description |
|---|---|
| Maximum | The highest value recorded by each channel during the 60-second VFT experiment. |
| Minimum | The lowest value recorded by each channel during the 60-second recording period. |
| Mean | The average of the data recorded by each channel during the 60-second fNIRS recording. |
| Rectification average | The average of the absolute values of all data points. |
| Skewness | A measure of the asymmetry of the data distribution, reflecting the direction and degree of skew. |
| Peak factor | The ratio of the peak value to the root mean square of the data, measuring how extreme the peak value is. |

Power spectral entropy is computed from the distribution of the signal's energy across the frequency domain, and singular spectral entropy is computed from the singular value spectrum obtained by singular value decomposition of the signal.
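The nine temporal features can be computed per channel roughly as follows. This sketch assumes NumPy and uses common textbook definitions rather than the study's exact formulas; it removes the mean before computing the power spectrum, and it builds the trajectory matrix for singular spectral entropy with a hypothetical embedding dimension of 10. The function names are illustrative.

```python
import numpy as np


def _entropy(weights: np.ndarray) -> float:
    """Shannon entropy (natural log) of non-negative weights normalized to sum to 1."""
    p = weights / weights.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())


def temporal_features(x: np.ndarray, fs: float, embed_dim: int = 10) -> dict:
    """Nine temporal features for one channel's signal x sampled at fs Hz.

    Assumes a non-constant signal. embed_dim (the trajectory-matrix width used
    for singular spectral entropy) is a hypothetical choice, not a study value.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()  # population standard deviation
    rms = np.sqrt(np.mean(x ** 2))

    # One-sided power spectrum of the mean-removed signal
    power = np.abs(np.fft.rfft(x - mean)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)

    # Trajectory (Hankel) matrix: each row is a sliding window of embed_dim samples
    rows = x.size - embed_dim + 1
    traj = np.stack([x[i:i + embed_dim] for i in range(rows)])
    singular_values = np.linalg.svd(traj, compute_uv=False)

    return {
        "maximum": float(x.max()),
        "minimum": float(x.min()),
        "mean": float(mean),
        "rectified_average": float(np.mean(np.abs(x))),
        "skewness": float(np.mean((x - mean) ** 3) / std ** 3) if std > 0 else 0.0,
        "peak_factor": float(np.max(np.abs(x)) / rms) if rms > 0 else 0.0,
        "mean_squared_frequency": float((freqs ** 2 * power).sum() / power.sum()),
        "power_spectral_entropy": _entropy(power),
        "singular_spectral_entropy": _entropy(singular_values),
    }
```

Here `x` would hold one channel's denoised 60-second signal; calling the function once per channel gives an 18 × 9 feature array. A pure sinusoid concentrates its power in a single frequency bin, so its power spectral entropy is close to zero, whereas white noise spreads power across bins and yields a much larger value.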

Table 1: Subject information

Table 4: Confusion matrix
The model metrics derived from Table 4 are presented in Table 5.

Table 5: Classification model metrics

Table 6: Results of the one-sample t-test. Bold indicates (p < 0.05) channels that are not activated.

Table 7: Results of the paired t-test. Bold indicates (p < 0.05) channels where there is no significant difference in activation strength.